15 research outputs found

    Prompt Engineering a Prompt Engineer

    Prompt engineering is a challenging yet crucial task for optimizing the performance of large language models (LLMs). It requires complex reasoning to examine the model's errors, hypothesize what is missing or misleading in the current prompt, and communicate the task with clarity. While recent works indicate that LLMs can be meta-prompted to perform automatic prompt engineering, their potential may not be fully tapped due to the lack of sufficient guidance in the meta-prompt to elicit complex reasoning capabilities in LLMs. In this work, we investigate the problem of "prompt engineering a prompt engineer" -- constructing a meta-prompt that more effectively guides LLMs to perform automatic prompt engineering. We introduce and analyze key components, such as a step-by-step reasoning template and context specification, which lead to improved performance. In addition, inspired by common optimization concepts such as batch size, step size and momentum, we introduce their verbalized counterparts to the meta-prompt and investigate their effects. Our final method, named PE2, finds a prompt that outperforms "let's think step by step" by 6.3% on the MultiArith dataset and 3.1% on the GSM8K dataset. To demonstrate its versatility, we apply PE2 to the Instruction Induction benchmark, a suite of counterfactual tasks, and a lengthy, real-world industrial prompt. In these settings, PE2 achieves strong performance and outperforms prior automatic prompt engineering baselines. Further, we show that PE2 makes meaningful and targeted prompt edits, amends erroneous or incomplete prompts, and presents non-trivial counterfactual reasoning abilities.

    MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks

    Recently, there has been a rapid advancement in research on Large Language Models (LLMs), resulting in significant progress in several Natural Language Processing (NLP) tasks. Consequently, there has been a surge in LLM evaluation research to comprehend the models' capabilities and limitations. However, much of this research has been confined to the English language, leaving LLM building and evaluation for non-English languages relatively unexplored. Several new LLMs have been introduced, necessitating their evaluation on non-English languages. This study aims to expand our MEGA benchmarking suite by including six new datasets to form the MEGAVERSE benchmark. The benchmark comprises 22 datasets covering 81 languages, including low-resource African languages. We evaluate several state-of-the-art LLMs like GPT-3.5-Turbo, GPT4, PaLM2, and Llama2 on the MEGAVERSE datasets. Additionally, we include two multimodal datasets in the benchmark and assess the performance of the LLaVa-v1.5 model. Our experiments suggest that GPT4 and PaLM2 outperform the Llama models on various tasks, notably on low-resource languages, with GPT4 outperforming PaLM2 on more datasets than vice versa. However, issues such as data contamination must be addressed to obtain an accurate assessment of LLM performance on non-English languages. Comment: 23 pages, 30 figures and 1 table

    MEGA: Multilingual Evaluation of Generative AI

    Generative AI models have shown impressive performance on many Natural Language Processing tasks such as language understanding, reasoning, and language generation. An important question being asked by the AI community today is about the capabilities and limits of these models, and it is clear that evaluating generative AI is very challenging. Most studies on generative LLMs have been restricted to English and it is unclear how capable these models are at understanding and generating text in other languages. We present the first comprehensive benchmarking of generative LLMs - MEGA, which evaluates models on standard NLP benchmarks, covering 16 NLP datasets across 70 typologically diverse languages. We compare the performance of generative LLMs including Chat-GPT and GPT-4 to State of the Art (SOTA) non-autoregressive models on these tasks to determine how well generative models perform compared to the previous generation of LLMs. We present a thorough analysis of the performance of models across languages and tasks and discuss challenges in improving the performance of generative LLMs on low-resource languages. We create a framework for evaluating generative LLMs in the multilingual setting and provide directions for future progress in the field. Comment: EMNLP 202

    Saynis 1

    The main purpose of this book is to develop the student's ability to observe and analyze the surrounding environment, and to provide the basic tools for future scientific learning.

    Saynis. Buugga 4

    The main purpose of this book is to encourage pupils to observe and analyze the environment in which they live, and to give them scientific knowledge that can serve as a basis for future scientific learning.

    Saynis. Buugga 3

    The aim of health education is to change pupils' behaviour so that they fulfil their responsibility to look after their own health and that of the surrounding society.

    Saynis - Buugga 2

    A science book for pupils in the second class of primary school, based on new methods for teaching science to children. It helps children understand that learning science is connected to the world around them, which they see and hear every day.

    Saynis. Buugga 3

    The aim of health education is to change pupils' behaviour so that they fulfil their responsibility to look after their own health and that of the surrounding society.

    Waxbarashada Caafimaadka. Buugga 3

    A manual for primary school pupils; it includes lessons on hygiene and on the essential behaviours for staying healthy and living together with the community in which the pupil lives.

    Xisaab 3

    A mathematics and geometry textbook for pupils in the third class of primary school: thousands, fractions, units of measurement.